feat: Enable mgpu in FrameView #5514
Conversation
Greptile Summary

This PR lifts the

Confidence Score: 5/5 — Safe to merge; the device-guard removal is well-scoped, the write-path refactor correctly eliminates the double PrepareForReuse, and the CI pre-flight ensures multi-GPU coverage cannot silently regress. The core change (passing `self._device` directly to `SelectPrims`) is minimal and the surrounding refactor is correctness-improving. The only outstanding note is a stale one-line docstring that still advertises the old device restriction; everything else in the diff is either a genuine fix or well-tested new coverage. The init docstring in `fabric_frame_view.py` still lists the old `'cpu'` or `'cuda:0'` restriction and should be updated to match the new behavior.
Reviews (4) — last reviewed commit: "Address review feedback on the multi-GPU..."
Force-pushed from a6cd73e to 2c619fe
Review Summary
Clean, well-motivated PR that removes an artificial device restriction. The refactoring of set_world_poses and set_scales into a single _compose_fabric_transform helper is a nice improvement — it ensures PrepareForReuse is only called once per logical update regardless of which components are being set.
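The consolidated write path can be sketched in stub form; the class and method names below are hypothetical stand-ins (the real helper dispatches a combined Warp kernel, which is elided here), but they show the invariant the review praises: one PrepareForReuse per logical update, whatever mix of components is set.

```python
class FabricTransformWriter:
    """Minimal sketch of the single-write-path pattern.

    Hypothetical names; not the real Fabric API. The point is that the
    prepare hook runs exactly once per compose() call, no matter which
    components participate in the update.
    """

    def __init__(self) -> None:
        self.prepare_calls = 0
        self.last_write: dict = {}

    def _prepare_for_reuse(self) -> None:
        # Stand-in for Fabric's PrepareForReuse; must run once per update.
        self.prepare_calls += 1

    def compose(self, positions=None, orientations=None, scales=None) -> None:
        self._prepare_for_reuse()  # exactly once, whatever is being set
        # The real helper launches one combined kernel here; this stub just
        # records which components were part of this logical update.
        self.last_write = {
            name: value
            for name, value in (
                ("positions", positions),
                ("orientations", orientations),
                ("scales", scales),
            )
            if value is not None
        }
```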
Strengths
- Single kernel launch for combined updates — `_sync_fabric_from_usd_once` now does pos+orient+scale in one pass instead of two separate calls
- assert → RuntimeError — topology guard won't be optimized away with `python -O`
- Thorough multi-GPU test coverage — proper skip/fail semantics depending on CI vs local
- CI workflow integration — dedicated `test-fabric-multi-gpu` job triggered by relevant file changes
- sys.argv stripping — prevents Kit segfault from pytest flags
Minor Notes
See inline comments.
🔍 Code Review for PR #5514: Multi-GPU FabricFrameView Enablement
Reviewing changes since last review (merge from develop b3b770d8 → c3798134)
✅ Overall Assessment
This PR cleanly enables FabricFrameView on non-primary CUDA devices, unblocking distributed training scenarios. The implementation is well-structured with appropriate test coverage.
🎯 Key Findings
1. Excellent Refactoring: Single Write Path ✅
The consolidation of `set_world_poses` and `set_scales` into the shared `_compose_fabric_transform` helper is a significant improvement:
- Reduces code duplication from ~40 lines to a single method call
- Ensures `PrepareForReuse` is invoked exactly once per logical update
- Properly handles the initial USD→Fabric sync in one atomic operation
```python
def _sync_fabric_from_usd_once(self) -> None:
    # Now combines all components in one write — clean!
    self._compose_fabric_transform(
        positions=positions_usd_ta.warp,
        orientations=orientations_usd_ta.warp,
        scales=scales_usd,
    )
```

2. Proper Assert→RuntimeError Migration ✅
The change from `assert` to `RuntimeError` for topology-change detection is correct and necessary:

```python
if self.count != self._default_view_indices.shape[0]:
    raise RuntimeError(
        f"Prim count changed ({self.count} vs {self._default_view_indices.shape[0]}). "
        "Fabric topology change added/removed tracked prims — full re-initialization required."
    )
```

This ensures the invariant check survives `python -O`, preventing silent corruption.
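The distinction is easy to demonstrate: `assert` statements are compiled out entirely when Python runs with `-O` (because `__debug__` is `False`), while an explicit `raise` is ordinary control flow. A generic illustration, not the PR's literal code:

```python
def guard_with_assert(count: int, expected: int) -> None:
    # Compiled out entirely under "python -O", so a real prim-count
    # mismatch would pass silently and later dispatch with stale indices.
    assert count == expected, "Fabric topology changed"


def guard_with_raise(count: int, expected: int) -> None:
    # Survives -O: the check is an ordinary if-statement, not an assert.
    if count != expected:
        raise RuntimeError(
            f"Prim count changed ({count} vs {expected}); "
            "full re-initialization required."
        )
```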
3. Test Coverage is Thorough ✅
The three new multi-GPU tests cover the critical paths:
- `test_fabric_cuda1_world_pose_roundtrip` — validates the primary write path
- `test_fabric_cuda1_no_usd_writeback` — confirms Fabric isolation from USD
- `test_fabric_cuda1_scales_roundtrip` — covers the secondary write path
The `_skip_if_unavailable` helper correctly uses `pytest.skip` (not `pytest.fail`) for graceful CI handling.
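The decision logic behind such a helper can be distilled to a couple of pure functions (hypothetical names; the real `_skip_if_unavailable` wraps this in `pytest.skip` and queries `torch.cuda.device_count()`):

```python
def cuda_index(device: str) -> int:
    """Parse the index out of a torch-style device string ("cuda:1" -> 1).

    A bare "cuda" means index 0.
    """
    _, _, idx = device.partition(":")
    return int(idx) if idx else 0


def should_skip(device: str, available_gpus: int) -> bool:
    # Always skip (never fail) when the requested index is out of range:
    # the CI workflow's own GPU pre-flight catches misconfigured runners
    # before pytest is ever invoked, so a hard failure here adds nothing.
    return cuda_index(device) >= available_gpus
```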
4. CI Configuration is Well-Designed ✅
The new `test-fabric-multi-gpu` job in `.github/workflows/test-multi-gpu.yaml`:
- Targets the correct runner labels: `[self-hosted, linux, x64, gpu, multi-gpu]`
- Triggers on changes to the relevant files
- Uses the `multi_gpu` marker filter to run only the appropriate tests
💡 Minor Suggestions (Non-Blocking)
- Documentation String Enhancement (optional)

  The `_compose_fabric_transform` docstring could mention that the method handles the `_fabric_usd_sync_done` flag update, clarifying the state machine:

  ```python
  def _compose_fabric_transform(...):
      """...

      Also sets ``_fabric_usd_sync_done = True`` after successful write,
      indicating that Fabric and USD are now out of sync (Fabric is authoritative).
      """
  ```
- Changelog Fragment ✅

  The changelog at `source/isaaclab_physx/changelog.d/feat-frame-view-enable-mgpu.rst` is well-written and accurately describes both the fix and the changed behavior.
🔒 Safety Checklist
- No breaking changes to public API
- `cuda:0` behavior unchanged
- Tests guard against regressions
- CI properly configured for multi-GPU validation
📋 Summary
| Aspect | Status |
|---|---|
| Code Quality | ✅ Clean |
| Test Coverage | ✅ Comprehensive |
| Documentation | ✅ Complete |
| CI Integration | ✅ Proper |
| Breaking Changes | ✅ None |
Recommendation: This PR is ready for merge. The latest changes are only merge commits from develop with no substantive code changes since the last review.
Note: This PR depends on #5380 — ensure that PR is merged first.
Force-pushed from ada2608 to c1254cd
🤖 Isaac Lab Review Bot — Follow-up Review (c1254cd)

All previously raised concerns have been addressed ✅

Resolved Issues from Previous Review:
Summary of Additional Changes:
- Version bumps
- Dependency updates
- Bug fixes
- New features
- Allow FabricFrameView to run on `cuda:N` for any N; USDRT `SelectPrims` no longer needs `cuda:0`.
- Refactor the Fabric write path into a single `_compose_fabric_transform` helper shared by `set_world_poses`, `set_scales`, and the initial USD→Fabric sync, collapsing the sync to one kernel launch with one `PrepareForReuse`.
- Replace the topology-invariant `assert` with `RuntimeError` so it survives `python -O`.
- Add a `multi_gpu` pytest marker plus `cuda:1` unit-test coverage for both Fabric write paths, and run them in the existing test-multi-gpu CI job (one extra step, no new job).
The standard pytest invocation in CI runs the fabric test file without filtering on the ``multi_gpu`` marker, so the ``cuda:1`` tests get scheduled on every runner including the single-GPU ones. Previously ``_skip_if_unavailable`` hard-failed via ``pytest.fail`` whenever ``GITHUB_ACTIONS=true`` and the requested device was missing, on the theory that this would catch a misconfigured multi-GPU runner. In practice it just broke the standard CI: the dedicated ``test-fabric-multi-gpu`` workflow already pre-flights ``torch.cuda.device_count() >= 2`` before invoking pytest, so a genuinely misconfigured multi-GPU runner is already caught there. Always skip rather than fail when the requested ``cuda:N`` index isn't available. Drop the now-unused ``import os``.
Kit's CLI parser reads sys.argv directly at startup and segfaults on
pytest flags that collide with its own short options. Running
pytest -m multi_gpu source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py
crashes during collection because Kit sees ``-m multi_gpu`` and exits
with ``Ill formed parameter: -m`` followed by SIGSEGV (exit code 245)
inside ``simulation_app._start_app``.
Strip sys.argv to argv[0] before instantiating AppLauncher. The test
file takes no CLI arguments of its own, mirroring the broader pattern
used by ``test_tiled_camera_env.py`` which assigns
``sys.argv[1:] = args_cli.unittest_args`` after argparse.
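In sketch form (the actual test file does this inline between the AppLauncher import and its instantiation, rather than via a named helper):

```python
import sys


def strip_pytest_argv() -> list:
    """Truncate sys.argv to the program name, returning the stripped tail.

    Kit reads sys.argv directly at startup and aborts on unknown short
    options (e.g. pytest's ``-m``), so this must run before AppLauncher
    is instantiated.
    """
    stripped = sys.argv[1:]
    sys.argv = sys.argv[:1]
    return stripped
```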
``wp.to_torch`` on a ProxyArray is deprecated in favor of the ``.torch`` accessor. Switch the three call sites that consume the ProxyArray returned by ``get_world_poses``; leave ``get_scales`` call sites alone, since that method still returns a raw ``wp.array`` (no ``.torch`` accessor).
- Add a GPU-count pre-flight step to the test-fabric-multi-gpu CI job so a runner regression to a single GPU fails the workflow instead of silently skipping every `cuda:1` test. This is what the comment in `_skip_if_unavailable` already promised existed.
- Note that the sys.argv strip in test_views_xform_prim_fabric.py must stay between the AppLauncher import and its instantiation; any CLI parser or reordering re-exposes Kit to pytest argv and segfaults at startup.
- Document the `_fabric_usd_sync_done` side effect on `_compose_fabric_transform` so callers can see why subsequent getters stop pulling from USD.
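A pre-flight step of roughly this shape would fail the job before pytest runs; the step name and exact command below are illustrative assumptions, not the PR's literal workflow:

```yaml
- name: Verify multi-GPU runner
  run: |
    nvidia-smi
    count="$(./isaaclab.sh -p -c 'import torch; print(torch.cuda.device_count())')"
    if [ "${count}" -lt 2 ]; then
      echo "::error::Expected >= 2 GPUs on this runner, found ${count}"
      exit 1
    fi
```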
The class docstring and __init__ device-param doc still claimed ``cuda:0`` only. Refresh both to note that Fabric acceleration runs on any CUDA index, so the autodoc API page reflects the actual contract.
Move the test-fabric-multi-gpu job out of test-multi-gpu.yaml and into a dedicated test-fabric-multi-gpu.yaml. The two workflows share the same runner label, install step, and GPU pre-flight, but trigger on disjoint path sets so changes to FabricFrameView no longer gate the distributed-training validation and vice versa. test-multi-gpu.yaml is now byte-identical to upstream/develop.
Force-pushed from c1254cd to 1c2e02d
Description
Removes the `cuda:0`-only restriction in `FabricFrameView`. USDRT `SelectPrims` now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., `cuda:1`) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU.

The user-facing surface change is small (drops the device guard in `__init__`, the assertion in `_initialize_fabric`, and the `_fabric_supported_devices` allowlist). The remainder of the diff comprises:

- `set_world_poses`, `set_scales`, and the initial USD→Fabric sync now share a single `_compose_fabric_transform` helper. The sync collapses to one combined kernel launch with one `PrepareForReuse`, eliminating the previous "double `PrepareForReuse`" code smell where a non-idempotent second call could mask a topology-change signal.
- `_rebuild_fabric_arrays` now raises `RuntimeError` instead of using `assert`, so the check survives `python -O`. Previously, an `-O` run with a real prim-count mismatch would silently dispatch with a stale `_view_to_fabric` mapping and produce wrong poses or out-of-bounds kernel indices.
- `cuda:1`-parameterized tests guarded by a new `multi_gpu` pytest marker (registered in `pyproject.toml`), plus a dedicated CI job on the existing multi-GPU runner so regressions show up automatically on PRs that touch `FabricFrameView` or its test file.

The skip-vs-fail logic in `_skip_if_unavailable` is intentional and works in concert with the CI workflow:

- On machines without `cuda:1`, the tests `pytest.skip` so local runs stay green.
- An `nvidia-smi` + `torch.cuda.device_count()` check sits at the top of `test-fabric-multi-gpu`. If the runner regresses to a single GPU, the pre-flight emits `::error::` and exits non-zero before pytest is even invoked, so a misconfigured runner cannot silently green-light a PR.

The test helper used to special-case `GITHUB_ACTIONS=true` and call `pytest.fail`, but that branch was removed: the workflow pre-flight catches the same failure mode without coupling the test file to `os.environ`.

Type of change
`cuda:0` continues to work exactly as before; `cuda:1` and above now also work instead of silently falling back to USD. No public API surface changed.

Checklist

- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md` or my name already exists there

Test plan
Three new tests, all marked `@pytest.mark.multi_gpu` and parameterized with `["cuda:1"]`:

- `test_fabric_cuda1_world_pose_roundtrip` — `set_world_poses` → `get_world_poses` returns the same values on a non-primary CUDA device.
- `test_fabric_cuda1_no_usd_writeback` — Fabric writes on `cuda:1` do not write back to USD (`atol=0.0` — equality, not approximate).
- `test_fabric_cuda1_scales_roundtrip` — covers the `set_scales` write path on `cuda:1`, since both Fabric write paths now run on `self._device`.

A new CI job, `test-fabric-multi-gpu`, runs in `.github/workflows/test-multi-gpu.yaml` on the existing `[self-hosted, linux, x64, gpu, multi-gpu]` runner. The job pre-flights with `nvidia-smi` and `./isaaclab.sh -p -c "import torch; print(torch.cuda.device_count())"`, and fails loudly with `::error::` if the runner regresses to a single GPU. Triggered automatically on PRs that touch `source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py` or its test file.

To verify locally on a multi-GPU machine:

```bash
./isaaclab.sh -p -m pytest -m multi_gpu \
    source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
```

To verify the `cuda:0` path is unchanged:

```bash
./isaaclab.sh -p -m pytest -m "not multi_gpu" \
    source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
```